{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -r requirements.txt --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load .env file (Copy .env-sample to .env and update accordingly)\n",
    "\n",
    "Set the appropriate environment variables below:\n",
    "\n",
    "1. Use the [Document Layout Skill](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-document-intelligence-layout) to convert PDFs and other compatible documents to markdown. It requires an [AI Services account](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services) and a search service in a [supported region](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services)\n",
    "   1. Specify `AZURE_AI_SERVICES_KEY` if using key-based authentication, and specify `AZURE_AI_SERVICES_ENDPOINT`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "import os\n",
    "\n",
    "load_dotenv(override=True) # take environment variables from .env.\n",
    "\n",
    "# Variables not used here do not need to be updated in your .env file\n",
    "endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
    "credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")) if os.getenv(\"AZURE_SEARCH_ADMIN_KEY\") else DefaultAzureCredential()\n",
    "index_namespace = os.getenv(\"AZURE_SEARCH_INDEX_NAMESPACE\", \"index-and-chat\")\n",
    "blob_connection_string = os.environ[\"BLOB_CONNECTION_STRING\"]\n",
    "# search blob datasource connection string is optional - defaults to blob connection string\n",
    "# This field is only necessary if you are using MI to connect to the data source\n",
    "# https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-credentials-and-connection-strings\n",
    "search_blob_connection_string = os.getenv(\"SEARCH_BLOB_DATASOURCE_CONNECTION_STRING\", blob_connection_string)\n",
    "blob_container_name = os.getenv(\"BLOB_CONTAINER_NAME\", \"index-and-chat\")\n",
    "azure_openai_endpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
    "azure_openai_key = os.getenv(\"AZURE_OPENAI_KEY\")\n",
    "azure_openai_api_version = os.environ[\"AZURE_OPENAI_API_VERSION\"]\n",
    "azure_openai_embedding_deployment = os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYMENT\", \"text-embedding-3-large\")\n",
    "azure_openai_model_name = os.getenv(\"AZURE_OPENAI_EMBEDDING_MODEL_NAME\", \"text-embedding-3-large\")\n",
    "azure_openai_model_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 3072))\n",
    "azure_openai_chat_deployment = os.environ[\"AZURE_OPENAI_CHAT_DEPLOYMENT\"]\n",
    "azure_ai_services_endpoint = os.environ[\"AZURE_AI_SERVICES_ENDPOINT\"]\n",
    "# This field is only necessary if you want to authenticate using a key to Azure AI Services\n",
    "azure_ai_services_key = os.getenv(\"AZURE_AI_SERVICES_KEY\", \"\")\n",
    "\n",
    "# Deepest nesting level in markdown that should be considered. See https://learn.microsoft.com/azure/search/cognitive-search-skill-document-intelligence-layout to learn more\n",
    "document_layout_depth = os.getenv(\"LAYOUT_MARKDOWN_HEADER_DEPTH\", \"h3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Blob Storage and load documents\n",
    "\n",
    "Retrieve documents from Blob Storage. You can use the sample documents in the data/documents folder.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setup sample data in index-and-chat\n"
     ]
    }
   ],
   "source": [
    "from azure.storage.blob import BlobServiceClient  \n",
    "import glob\n",
    "\n",
    "def upload_sample_documents(\n",
    "        blob_connection_string: str,\n",
    "        blob_container_name: str,\n",
    "        documents_directory: str,\n",
    "        # Set to false if you want to use credentials included in the blob connection string\n",
    "        # Otherwise your identity will be used as credentials\n",
    "        use_user_identity: bool = True,\n",
    "    ):\n",
    "        # Connect to Blob Storage\n",
    "        blob_service_client = BlobServiceClient.from_connection_string(logging_enable=True, conn_str=blob_connection_string, credential=DefaultAzureCredential() if use_user_identity else None)\n",
    "        container_client = blob_service_client.get_container_client(blob_container_name)\n",
    "        if not container_client.exists():\n",
    "            container_client.create_container()\n",
    "\n",
    "        pdf_files = glob.glob(os.path.join(documents_directory, '*.pdf'))\n",
    "        for file in pdf_files:\n",
    "            with open(file, \"rb\") as data:\n",
    "                name = os.path.basename(file)\n",
    "                if not container_client.get_blob_client(name).exists():\n",
    "                    container_client.upload_blob(name=name, data=data)\n",
    "\n",
    "upload_sample_documents(\n",
    "    blob_connection_string=blob_connection_string,\n",
    "    blob_container_name=blob_container_name,\n",
    "    documents_directory = os.path.join(\"..\", \"..\", \"..\", \"..\", \"data\", \"benefitdocs\")\n",
    ")\n",
    "\n",
    "print(f\"Setup sample data in {blob_container_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a blob data source connector on Azure AI Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data source 'index-and-chat-blob' created or updated\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceConnection\n",
    ")\n",
    "from azure.search.documents.indexes.models import NativeBlobSoftDeleteDeletionDetectionPolicy\n",
    "\n",
    "# Create a data source \n",
    "indexer_client = SearchIndexerClient(endpoint, credential)\n",
    "container = SearchIndexerDataContainer(name=blob_container_name)\n",
    "data_source_connection = SearchIndexerDataSourceConnection(\n",
    "    name=f\"{index_namespace}-blob\",\n",
    "    type=\"azureblob\",\n",
    "    connection_string=search_blob_connection_string,\n",
    "    container=container,\n",
    "    data_deletion_detection_policy=NativeBlobSoftDeleteDeletionDetectionPolicy()\n",
    ")\n",
    "data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)\n",
    "\n",
    "print(f\"Data source '{data_source.name}' created or updated\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create search indexes\n",
    "\n",
    "Vector and nonvector content is stored in a search index.\n",
    "There's 1 index for the chunks and 1 index for the parent markdown documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index-and-chat-parent created\n",
      "index-and-chat-child created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchField,\n",
    "    SearchFieldDataType,\n",
    "    VectorSearch,\n",
    "    HnswAlgorithmConfiguration,\n",
    "    VectorSearchProfile,\n",
    "    AzureOpenAIVectorizer,\n",
    "    AzureOpenAIVectorizerParameters,\n",
    "    SemanticConfiguration,\n",
    "    SemanticSearch,\n",
    "    SemanticPrioritizedFields,\n",
    "    SemanticField,\n",
    "    SearchIndex,\n",
    "    BinaryQuantizationCompression\n",
    ")\n",
    "\n",
    "# Create a search index  \n",
    "index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  \n",
    "child_index_fields = [  \n",
    "    SearchField(name=\"parent_id\", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),  \n",
    "    SearchField(name=\"title\", type=SearchFieldDataType.String),  \n",
    "    SearchField(name=\"chunk_id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name=\"keyword\"),  \n",
    "    SearchField(name=\"chunk\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  \n",
    "    SearchField(name=\"vector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), stored=False, vector_search_dimensions=azure_openai_model_dimensions, vector_search_profile_name=\"myHnswProfile\"), \n",
    "    SearchField(name=\"header_1\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "    SearchField(name=\"header_2\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "    SearchField(name=\"header_3\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False) \n",
    "]\n",
    "\n",
    "parent_index_fields = [  \n",
    "    SearchField(name=\"parent_id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),  \n",
    "    SearchField(name=\"title\", type=SearchFieldDataType.String, searchable=True, filterable=True, sortable=False, facetable=True),  \n",
    "    SearchField(name=\"content\", type=SearchFieldDataType.String, searchable=True, filterable=False, sortable=False, facetable=False), \n",
    "    SearchField(name=\"metadata_storage_path\", type=SearchFieldDataType.String, filterable=True, sortable=False, facetable=True)\n",
    "]\n",
    "\n",
    "  \n",
    "# Configure the vector search configuration  \n",
    "vector_search = VectorSearch(  \n",
    "    algorithms=[  \n",
    "        HnswAlgorithmConfiguration(name=\"myHnsw\"),\n",
    "    ],  \n",
    "    profiles=[  \n",
    "        VectorSearchProfile(  \n",
    "            name=\"myHnswProfile\",  \n",
    "            algorithm_configuration_name=\"myHnsw\",  \n",
    "            vectorizer_name=\"myOpenAI\",  \n",
    "            compression_name=\"binaryQuantization\"\n",
    "        )\n",
    "    ],  \n",
    "    vectorizers=[  \n",
    "        AzureOpenAIVectorizer(  \n",
    "            vectorizer_name=\"myOpenAI\",  \n",
    "            kind=\"azureOpenAI\",  \n",
    "            parameters=AzureOpenAIVectorizerParameters(  \n",
    "                resource_url=azure_openai_endpoint,  \n",
    "                deployment_name=azure_openai_embedding_deployment,\n",
    "                model_name=azure_openai_model_name,\n",
    "                api_key=azure_openai_key,\n",
    "            ),\n",
    "        ),  \n",
    "    ],\n",
    "    compressions=[\n",
    "        BinaryQuantizationCompression(compression_name=\"binaryQuantization\")\n",
    "    ]\n",
    ")\n",
    "  \n",
    "semantic_config = SemanticConfiguration(  \n",
    "    name=\"my-semantic-config\",  \n",
    "    prioritized_fields=SemanticPrioritizedFields(  \n",
    "        content_fields=[SemanticField(field_name=\"chunk\")],\n",
    "        title_field=SemanticField(field_name=\"title\")\n",
    "    ),\n",
    ")\n",
    "  \n",
    "# Create the semantic search with the configuration  \n",
    "semantic_search = SemanticSearch(configurations=[semantic_config])  \n",
    "  \n",
    "# Create the search indexes\n",
    "parent_index = SearchIndex(name=f\"{index_namespace}-parent\", fields=parent_index_fields)  \n",
    "child_index = SearchIndex(name=f\"{index_namespace}-child\", fields=child_index_fields, vector_search=vector_search, semantic_search=semantic_search)\n",
    "result = index_client.create_or_update_index(parent_index)  \n",
    "print(f\"{result.name} created\")\n",
    "result = index_client.create_or_update_index(child_index)  \n",
    "print(f\"{result.name} created\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a skillset\n",
    "\n",
    "Skills drive integrated vectorization. [Text Split](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) provides data chunking. [AzureOpenAIEmbedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) handles calls to Azure OpenAI, using the connection information you provide in the environment variables. An [indexer projection](https://learn.microsoft.com/azure/search/index-projections-concept-intro) specifies secondary indexes used for chunked data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index-and-chat-skillset created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    SplitSkill,\n",
    "    InputFieldMappingEntry,\n",
    "    OutputFieldMappingEntry,\n",
    "    AzureOpenAIEmbeddingSkill,\n",
    "    MergeSkill,\n",
    "    SearchIndexerIndexProjection,\n",
    "    SearchIndexerIndexProjectionSelector,\n",
    "    SearchIndexerIndexProjectionsParameters,\n",
    "    IndexProjectionMode,\n",
    "    SearchIndexerSkillset,\n",
    "    AIServicesAccountKey,\n",
    "    AIServicesAccountIdentity,\n",
    "    DocumentIntelligenceLayoutSkill\n",
    ")\n",
    "\n",
    "# Create a skillset name \n",
    "skillset_name = f\"{index_namespace}-skillset\"\n",
    "\n",
    "\n",
    "layout_skill = DocumentIntelligenceLayoutSkill(\n",
    "    description=\"Layout skill to read documents\",\n",
    "    context=\"/document\",\n",
    "    output_mode=\"oneToMany\",\n",
    "    markdown_header_depth=\"h3\",\n",
    "    inputs=[\n",
    "        InputFieldMappingEntry(name=\"file_data\", source=\"/document/file_data\")\n",
    "    ],\n",
    "    outputs=[\n",
    "        OutputFieldMappingEntry(name=\"markdown_document\", target_name=\"markdownDocument\")\n",
    "    ]\n",
    ")\n",
    "\n",
    "split_skill = SplitSkill(  \n",
    "    description=\"Split skill to chunk documents\",  \n",
    "    text_split_mode=\"pages\",  \n",
    "    context=\"/document/markdownDocument/*\",  \n",
    "    maximum_page_length=2000,  \n",
    "    page_overlap_length=500,  \n",
    "    inputs=[  \n",
    "        InputFieldMappingEntry(name=\"text\", source=\"/document/markdownDocument/*/content\"),  \n",
    "    ],  \n",
    "    outputs=[  \n",
    "        OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")  \n",
    "    ]\n",
    ")\n",
    "\n",
    "merge_skill = MergeSkill(\n",
    "    description=\"Merge skill to get full document content\",\n",
    "    insert_pre_tag=\"\",\n",
    "    insert_post_tag=\"\\n\",\n",
    "    context=\"/document\",\n",
    "    inputs=[\n",
    "        InputFieldMappingEntry(name=\"itemsToInsert\", source=\"/document/markdownDocument/*/content\")\n",
    "    ],\n",
    "    outputs=[\n",
    "        OutputFieldMappingEntry(name=\"mergedText\", target_name=\"content\")\n",
    "    ]\n",
    ")\n",
    "\n",
    "embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "    description=\"Skill to generate embeddings via Azure OpenAI\",  \n",
    "    context=\"/document/markdownDocument/*/pages/*\",  \n",
    "    resource_url=azure_openai_endpoint,  \n",
    "    deployment_name=azure_openai_embedding_deployment, \n",
    "    model_name=azure_openai_model_name,\n",
    "    dimensions=azure_openai_model_dimensions,\n",
    "    api_key=azure_openai_key,  \n",
    "    inputs=[  \n",
    "        InputFieldMappingEntry(name=\"text\", source=\"/document/markdownDocument/*/pages/*\"),  \n",
    "    ],  \n",
    "    outputs=[\n",
    "        OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")  \n",
    "    ]\n",
    ")\n",
    "\n",
    "index_projections = SearchIndexerIndexProjection(  \n",
    "    selectors=[  \n",
    "        SearchIndexerIndexProjectionSelector(  \n",
    "            target_index_name=child_index.name,  \n",
    "            parent_key_field_name=\"parent_id\",  \n",
    "            source_context=\"/document/markdownDocument/*/pages/*\",  \n",
    "            mappings=[\n",
    "                InputFieldMappingEntry(name=\"chunk\", source=\"/document/markdownDocument/*/pages/*\"),  \n",
    "                InputFieldMappingEntry(name=\"title\", source=\"/document/metadata_storage_name\"),\n",
    "                InputFieldMappingEntry(name=\"vector\", source=\"/document/markdownDocument/*/pages/*/vector\"),\n",
    "                InputFieldMappingEntry(name=\"header_1\", source=\"/document/markdownDocument/*/sections/h1\"),\n",
    "                InputFieldMappingEntry(name=\"header_2\", source=\"/document/markdownDocument/*/sections/h2\"),\n",
    "                InputFieldMappingEntry(name=\"header_3\", source=\"/document/markdownDocument/*/sections/h3\"),\n",
    "            ]\n",
    "        )\n",
    "    ],  \n",
    "    parameters=SearchIndexerIndexProjectionsParameters(  \n",
    "        projection_mode=IndexProjectionMode.INCLUDE_INDEXING_PARENT_DOCUMENTS  \n",
    "    )  \n",
    ")\n",
    "\n",
    "skills = [layout_skill, split_skill, merge_skill, embedding_skill]\n",
    "\n",
    "skillset = SearchIndexerSkillset(  \n",
    "    name=skillset_name,  \n",
    "    description=\"Skillset to chunk documents and generating embeddings\",  \n",
    "    skills=skills,  \n",
    "    index_projection=index_projections,\n",
    "    cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)\n",
    ")\n",
    "\n",
    "client = SearchIndexerClient(endpoint, credential)  \n",
    "client.create_or_update_skillset(skillset)  \n",
    "print(f\"{skillset.name} created\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create an indexer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " index-and-chat-indexer is created and running. If queries return no results, please wait a bit and try again.\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexer,\n",
    "    IndexingParameters,\n",
    "    IndexingParametersConfiguration,\n",
    "    FieldMapping\n",
    ")\n",
    "\n",
    "# Create an indexer  \n",
    "indexer_name = f\"{index_namespace}-indexer\"  \n",
    "\n",
    "indexer_parameters = IndexingParameters(\n",
    "    configuration=IndexingParametersConfiguration(\n",
    "        allow_skillset_to_read_file_data=True,\n",
    "        data_to_extract=\"storageMetadata\",\n",
    "        query_timeout=None))\n",
    "\n",
    "indexer = SearchIndexer(  \n",
    "    name=indexer_name,  \n",
    "    description=\"Indexer to index documents and generate embeddings\",  \n",
    "    skillset_name=skillset_name,  \n",
    "    target_index_name=parent_index.name,  \n",
    "    data_source_name=data_source.name,\n",
    "    parameters=indexer_parameters,\n",
    "    field_mappings=[\n",
    "        FieldMapping(source_field_name=\"metadata_storage_name\", target_field_name=\"title\"),\n",
    "    ],\n",
    "    output_field_mappings=[\n",
    "        FieldMapping(source_field_name=\"/document/content\", target_field_name=\"content\"),\n",
    "    ]\n",
    ")  \n",
    "\n",
    "indexer_client = SearchIndexerClient(endpoint, credential)  \n",
    "indexer_result = indexer_client.create_or_update_indexer(indexer)  \n",
    "  \n",
    "# Run the indexer  \n",
    "indexer_client.run_indexer(indexer_name)  \n",
    "print(f' {indexer_name} is created and running. If queries return no results, please wait a bit and try again.')  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chat with your data\n",
    "\n",
    "Below are 2 strategies you can use to chat with your data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from azure.search.documents.aio import SearchClient\n",
    "from azure.search.documents.models import VectorizableTextQuery\n",
    "from openai import AsyncAzureOpenAI\n",
    "from openai.types.chat import ChatCompletion, ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam, ChatCompletionMessage, ChatCompletionMessageParam\n",
    "from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider\n",
    "from pydantic import BaseModel, Field\n",
    "from typing import Optional\n",
    "\n",
    "token_provider = get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\")\n",
    "parent_index_client = SearchClient(endpoint=endpoint, index_name=parent_index.name, credential=credential)\n",
    "child_index_client = SearchClient(endpoint=endpoint, index_name=child_index.name, credential=credential)\n",
    "\n",
    "client = AsyncAzureOpenAI(\n",
    "    api_version=azure_openai_api_version,\n",
    "    azure_endpoint=azure_openai_endpoint,\n",
    "    api_key=azure_openai_key,\n",
    "    azure_ad_token_provider=token_provider if not azure_openai_key else None\n",
    ")\n",
    "\n",
    "# This code can be customized to extract different entities from the query based on your requirements.\n",
    "# See https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs for more information\n",
    "# NOTE: Updating the tool definition with specific examples related to your data will help improve the accuracy.\n",
    "class ExtractTitles(BaseModel):\n",
    "    \"\"\"Extracts titles from a query to use in a search filter.\"\"\"\n",
    "    titles: Optional[List[str]] = Field(..., description=\"List of titles extracted from the query. Complete file names are considered titles. If there are no titles in the query, provide an empty list. For example, in the query 'Find the report on sales and the summary of the meeting using 'myreport.pdf', the titles would be ['myreport.pdf']. If no titles are found, return an empty list.\")\n",
    "\n",
    "async def extract_titles(query: str) -> List[str]:\n",
    "   response: ChatCompletion = await client.beta.chat.completions.parse(\n",
    "      model=azure_openai_chat_deployment,\n",
    "      messages=[\n",
    "         ChatCompletionSystemMessageParam(role=\"system\", content=\"You are a helpful assistant that extracts titles from user queries.\"),\n",
    "         ChatCompletionUserMessageParam(role=\"user\", content=f\"Extract the titles from the following query: '{query}'\"),\n",
    "      ],\n",
    "      response_format=ExtractTitles\n",
    "   )\n",
    "\n",
    "   return response.choices[0].message.parsed.titles\n",
    "\n",
    "async def answer_query_documents(query: str, chat_history: Optional[List[ChatCompletionMessageParam]] = [], include_parent_documents: bool = True, include_child_documents: bool = True) -> List[ChatCompletionMessageParam]:\n",
    "   if not include_parent_documents and not include_child_documents:\n",
    "      raise ValueError(\"At least one of include_parent_documents or include_child_documents must be True.\")\n",
    "\n",
    "   formatted_results = \"\"\n",
    "   titles = []\n",
    "\n",
    "   if include_parent_documents:\n",
    "      # Step 1: Extract titles from the query\n",
    "      titles = await extract_titles(query)\n",
    "\n",
    "      # Step 2: If we found titles, include them in the query of the parent index\n",
    "      if titles:\n",
    "         results = await parent_index_client.search(\n",
    "            filter=\" or \".join([f\"title eq '{title}'\" for title in titles]),  # Filter by titles, must be exact match\n",
    "            top=len(titles),  # Limit to top results\n",
    "            select=[\"title\", \"content\"])\n",
    "         formatted_results = \"\\n\".join([f\"{result['title']}\\n{result['content']}\" async for result in results])\n",
    "\n",
    "\n",
    "   if len(formatted_results) == 0 and include_child_documents:\n",
    "      # If no titles were found or no results were returned, search the child index with a vectorized query\n",
    "      results = await child_index_client.search(\n",
    "         search_text=query,\n",
    "         vector_queries=[VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\")],  # Use vector search with k nearest neighbors\n",
    "         query_type=\"semantic\",\n",
    "         semantic_configuration_name=child_index.semantic_search.configurations[0].name,  # Use the semantic configuration created earlier\n",
    "         top=5,  # Limit to top 5 results\n",
    "         select=[\"title\", \"chunk\"]\n",
    "      )\n",
    "\n",
    "      # Format the results from the child index\n",
    "      formatted_results = \"\\n\".join([f\"{result['title']}\\n{result['chunk']}\" async for result in results])\n",
    "\n",
    "   assistant_system_message = \"You are a helpful assistant that answers queries. You do not have access to the internet, but you can use documents in the chat history to answer the question. If the documents do not contain the answer, say 'I don't know'. You must cite your answer with the titles of the documents used. If you are unsure, say 'I don't know'.\"\n",
    "   query_message = f\"Answer the following query: {query}\\nRelevant documents: {formatted_results}\"\n",
    "   messages = chat_history + [ ChatCompletionUserMessageParam(role=\"user\", content=query_message) ] if chat_history else [\n",
    "      ChatCompletionSystemMessageParam(role=\"system\", content=assistant_system_message),\n",
    "      ChatCompletionUserMessageParam(role=\"user\", content=query_message),\n",
    "   ] \n",
    "   \n",
    "   response: ChatCompletion = await client.chat.completions.create(\n",
    "      model=azure_openai_chat_deployment,\n",
    "      messages=messages\n",
    "   )\n",
    "\n",
    "   message: ChatCompletionMessage = response.choices[0].message\n",
    "   if titles:\n",
    "      message.content += f\"\\nTitles used for the answer: {', '.join(titles)}\"\n",
    "\n",
    "   return messages + [message]\n",
    "\n",
    "\n",
    "def get_last_answer(chat_history: List[ChatCompletionMessageParam]) -> Optional[str]:\n",
    "    \"\"\"Prints the last assistant message from the chat history.\"\"\"\n",
    "    if chat_history and chat_history[-1].role == \"assistant\":\n",
    "        return chat_history[-1].content\n",
    "    \n",
    "    return None\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Different Strategies\n",
    "\n",
    "1. If you include the title of a PDF in your question, the search is automatically filtered to only include those PDFs. You can disable this behavior by passing `include_parent_documents=False` to `answer_query_documents`\n",
    "2. If you don't include any titles, normal chunk search is used. You can disable this behavior by passing `include_child_documents=False` to `answer_query_documents`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The health insurance policies offered are:\\n\\n1. **Northwind Health Plus**: A comprehensive plan covering medical, vision, and dental services, prescription drug coverage (including generic, brand-name, and specialty drugs), mental health and substance abuse, preventive care, emergency services (both in-network and out-of-network), and more. It also includes routine physicals, well-child visits, immunizations, vision exams, glasses, contact lenses, dental exams, cleanings, fillings, hospital stays, doctor visits, lab tests, and X-rays.\\n\\n2. **Northwind Standard**: A basic plan covering medical, vision, and dental services, prescription drug coverage (only generic and brand-name drugs), and preventive care. It does not cover emergency services, mental health and substance abuse, or out-of-network services. It includes routine physicals, well-child visits, immunizations, vision exams, glasses, doctor visits, and lab tests.\\n\\nThese plans have different costs deducted from each paycheck, with Northwind Health Plus generally having a higher cost than Northwind Standard.\\n\\n(\"Benefit_Options.pdf\")\\nTitles used for the answer: Benefit_Options.pdf'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat_history = await answer_query_documents(\"Use Benefit_Options.pdf. What are the health insurance policies?\")\n",
    "get_last_answer(chat_history)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The new health insurance policy, Northwind Health Plus, provides comprehensive coverage for medical, vision, and dental services. Its benefits include:\\n\\n1. **Prescription Drug Coverage**: Northwind Health Plus covers a wide range of prescription drugs, including generic, brand-name, and specialty drugs.\\n\\n2. **Preventive Care**: This plan covers routine physicals, well-child visits, immunizations, mammograms, colonoscopies, and other cancer screenings.\\n\\n3. **Mental Health and Substance Abuse Coverage**: The plan includes coverage for mental health and substance abuse services.\\n\\n4. **Emergency Services**: Coverage is provided for both in-network and out-of-network emergency services.\\n\\n5. **Vision and Dental Services**: Northwind Health Plus covers vision exams, glasses, contact lenses, dental exams, cleanings, and fillings.\\n\\n6. **Out-of-Network Services**: Unlike Northwind Standard, Northwind Health Plus includes coverage for out-of-network services.\\n\\nThese benefits make Northwind Health Plus more comprehensive compared to the Northwind Standard plan. \\n\\n*Source: Benefit_Options.pdf*'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat_history = await answer_query_documents(\"What are the benefits of the new health insurance policy?\")\n",
    "get_last_answer(chat_history)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
